Summarine

Language Evolution

p. 1

Language evolution: the hardest problem in science?


Morten H. Christiansen and Simon Kirby


language evolution, a recent field

  • 1866: ban on theories about origins of language → too much speculation
  • 1975 (!): first serious conference about the topic
  • 1990s: inception of language evolution as a serious field
p. 12

Consensus and remaining controversies


Consensus


1. interdisciplinarity

  • language evolution must be approached from many disciplines
p. 13

2. growing interest in mathematical and computational modelling

  • 👁 ↓

Models are useful because they allow researchers to test particular theories about the mechanisms underlying the evolution of language. Given the number of different factors that may potentially influence language evolution, our intuitions about their complex interactions are often limited. It is exactly in these circumstances, when multiple processes have to be considered together, that modelling becomes a useful – and perhaps even necessary – tool.

In this book, modelling work has been used to inform theories about biological adaptations for grammar (Pinker, Dunbar, Briscoe, Komarova and Nowak, Chapters 2, 12, 16, and 17), about the emergence of language structure through cultural transmission (Hurford, Deacon, Kirby and Christiansen, Chapters 3, 7, and 15), and about the evolution of phonetic gesture systems (Studdert-Kennedy and Goldstein, Chapter 13).

We envisage that the interest in mathematical and computational modelling is likely to increase even further, especially as it becomes more sophisticated in terms of both psychological mechanisms and linguistic complexity.

3. pre-adaptations necessary

  • before language, there was a precursor
  • perhaps ‘the ability to use symbols’
p. 14

Disagreement


1. constraints on language structure

  • consequence of biological adaptations, or products of transmission?

… no further controversies explicitly mentioned

p. 15

Language itself is rather difficult to define, existing as it does both as transitory utterances that leave no trace, and as patterns of neural connectivity in the natural world’s most complex brains. It is never stationary, changing over time and within populations which themselves are dynamic. It is infinitely flexible and (almost) universally present. It is by far the most complex behaviour we know of […].

p. 16

Language as an adaptation to the cognitive niche

Steven Pinker

Introduction

central theory

  • human language faculty is a complex biological adaptation that evolved by natural selection for communication in a knowledge-using, socially interdependent lifestyle

The design of human language

p. 17

Words

mental lexicon

  • a finite memorized list of words

word

  • an arbitrary sign: a connection between a signal and a concept, shared by members of the community

bidirectionality of symbols

  • ‘if I can use a word, I can understand it when someone else uses it, and vice versa’

Grammar

grammar

  • combining words into larger words, phrases and sentences
  • (author’s note: the further characterisation of grammar is decidedly Chomskyan)
p. 21

Interfaces of language with other parts of the mind

parts of the mind interfacing with language

  • grammar
  • perception
  • articulation
  • conceptual knowledge (provides the meanings of words and their relationships)
  • social knowledge (how language can be used and interpreted in a social context)

Author’s note: Cognitive Grammar assumes many of these aspects are NOT separated and are part of the same structure.

Is language an adaptation?

adaptation

  • a trait whose genetic basis was shaped by natural selection

Is language a distinct part of the human phenotype?

language as general cognitive abilities

  • the combination of many cognitive abilities leads to language facility
p. 22-23

(arguments for a language faculty)

p. 23

Did language evolve by means other than natural selection?

theory

  • language evolved ‘all at once’ as the product of a macromutation
  • bruh
p. 26

What did language evolve for?

p. 27

The cognitive niche

the cognitive niche

  • the unified explanation of the many human traits that are unusual in the rest of the living world
    • e.g. children, family, community, culture …
    • ⇒ language at the centre
p. 29

Alternatives to the cognitive niche theory

1. deception
(Dawkins and Krebs)

  • language evolved to deceive and manipulate other people
  • but: this doesn’t make sense communicatively → no raison d’être for language
p. 30

2. thinking

  • language evolved to allow us to think rather than to communicate
  • “it is impossible to think at human levels of complexity without a representational medium” (Bickerton 1990)
  • but: this assumes a strong Whorfian view on cognition, skips grammar

3. grooming

  • (from Freek’s class) language as social alternative for grooming in larger groups (Dunbar 1998)

4. courtship device

  • language to advertise the fitness of our brains (Miller 2000)

New tests of the theory that language is an adaptation

p. 31

Language and evolutionary game theory

Evolutionary game theory has allowed biologists to predict how organisms ought to interact with other organisms co-evolving their own strategies (Maynard Smith 1982). Language, like sex, aggression and cooperation, is a game it takes two to play, and game theory can provide the external criteria for utility enjoyed by the rest of evolutionary biology.

Modellers assume only that the transmission of information between partners provides them with an advantage (say, by exchanging information or coordinating their behaviour), and that the advantage translates into more offspring, with similar communicative skills.

The question then is how a stable communication system might evolve from repeated pairwise interactions and, crucially, whether such systems have the major design features of human language.

1. Hurford (1989)

  • arbitrary, bi-directional sign will drive out other schemes over time
p. 31-32

2. Nowak et al. (1999, 2000)

  • errors in signalling or perception are inevitable, especially when signals are physically similar
  • usefulness of syntax to reduce memory load
p. 33
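
The payoff bookkeeping behind such game-theoretic models can be sketched in Python. This is a minimal illustration, assuming a matrix formulation in which each agent has a production matrix P (meaning to signal probabilities) and a reception matrix Q (signal to meaning probabilities). The names `comm_success`, `payoff` and `with_noise`, and all numbers, are assumptions made for the example and are not taken from the chapter.

```python
from typing import List

Matrix = List[List[float]]


def comm_success(P: Matrix, Q: Matrix) -> float:
    """Expected number of meanings a speaker (P: meanings x signals)
    gets across to a hearer (Q: signals x meanings)."""
    n_meanings = len(P)
    n_signals = len(Q)
    return sum(P[m][s] * Q[s][m]
               for m in range(n_meanings)
               for s in range(n_signals))


def payoff(PA: Matrix, QA: Matrix, PB: Matrix, QB: Matrix) -> float:
    """Symmetric payoff: average of A speaking to B and B speaking to A."""
    return 0.5 * (comm_success(PA, QB) + comm_success(PB, QA))


def with_noise(P: Matrix, U: Matrix) -> Matrix:
    """Pass production through a confusion matrix U (sent x heard signal)."""
    return [[sum(P[m][k] * U[k][s] for k in range(len(U)))
             for s in range(len(U[0]))]
            for m in range(len(P))]


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    swapped = [[0.0, 1.0], [1.0, 0.0]]
    # Bidirectional sign user: reception mirrors production.
    print(payoff(identity, identity, identity, identity))
    # Non-bidirectional user: would misunderstand its own signals.
    print(payoff(identity, swapped, identity, swapped))
    # Similar-sounding signals get confused 10% of the time.
    print(comm_success(with_noise(identity, [[0.9, 0.1], [0.1, 0.9]]), identity))
```

In this toy setting an agent whose reception matrix mirrors its production matrix communicates perfectly with copies of itself, while one whose reception does not match its production gets no payoff, which is the intuition behind the bidirectional-sign result noted above. The confusion matrix shows how perceptual errors lower the payoff.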

Language and molecular evolution

Mathematical models and computer simulations can show that the advantages claimed for some features of language really can evolve by known mechanisms of natural selection. These models cannot, of course, show that language in fact evolved according to the proposed scenario.

(gene things I don’t understand)

p. 38

The language mosaic and its evolution


James R. Hurford


p. 40

key to explaining complex phenomena of human language

  • understanding how they could have evolved from less complex phenomena

Dichotomy in language evolution

| biological evolution | historical evolution |
|---|---|
| of the language capacity | of individual languages |
p. 41

Biological steps to language-readiness: pre-adaptations

pre-adaptation

  • a change in a species which is not itself adaptive (= selectively neutral)
  • but: paves the way for subsequent adaptive changes

For example, bipedalism set in train anatomical changes which culminated in the human vocal tract. Though speech is clearly adaptive, bipedalism itself is not an adaptation for speech; it is a pre-adaptation.

  1. pre-phonetic capacity to perform speech sounds or manual gestures
  2. pre-syntactic capacity to organise longer sequences of sounds or gestures
  3. pre-semantic capacities
    • to form basic concepts
    • to construct more complex concepts (e.g. propositions)
    • to carry out mental calculations over complex concepts
  4. pre-pragmatic capacities
    • to infer what mental calculations others can carry out
    • to act cooperatively
    • to attend to the same external situations as others
    • to accept symbolic action as a surrogate for real action
  5. an elementary symbolic capacity to link sounds or gestures arbitrarily with basic concepts, such that perception of the action activates the concept, and attention to the concept may initiate the sound or gesture

(this list can be used as a checklist for the computer models, to highlight what has and hasn’t been modelled)

Cultural evolution of languages

p. 50

The two-phase nature of language transmission

p. 51

| historical linguistics | language evolution |
|---|---|
| uniformitarianism | simple to complex |
| same principles then as now | complex principles have arisen out of simple principles |

Grammaticalisation

grammaticalisation

  • syntactic organisation, and the overt markers associated with it, emerges from non-syntactic, principally lexical and discourse, organisation
  • spiralling interaction of the two phases of language’s existence, I-language and E-language

This is still a very formal way of looking at things. Still, this was 2003, so cut the man a little slack.

p. 52

unidirectionality

  • the general trend of grammaticalisation processes is all in one direction
p. 53

Computer modelling of language evolution

computer modellers of emerging language

  • start from simulated populations with no language at all
  • simulations can lead to interesting results in which the populations have converged on coordinated communicative codes which, though still extremely simple, share noteworthy characteristics with human language
p. 54
  • Batali (1998; 2002)
  • Kirby (200; 2002)
  • Hurford (2000)
  • Teal and Taylor (1999)
  • Tonkes and Wiles (2002)

A survey of some of these works, analysing their principal dimensions, and the issues they raise, appears in Hurford (2002).

iterated learning models

  • do not attempt ‘to put everything together’ and reach a full language-like outcome
  • rather: explore interactions between pairs of strictly isolated factors relevant to the iterated learning model (e.g. Brighton and Kirby 2001)
p. 55

This strand of computational simulation research has the potential to clarify the essentials of the interaction between (a) the psychological capacities of language learners and (b) the historical dynamics of populations of learners giving rise to complex grammars resembling the grammars of real natural languages.

In such simulations, a population of agents begins with no shared system of communication. The agents are ‘innately’ endowed with certain competencies, typically including control of a space of possible meanings, an inventory of possible signals, and a capacity for acquiring grammars of certain specified sorts on exposure to examples of meaning-signal pairs.

The simulations typically proceed with each generation learning from its predecessor, on the basis of observation of its communicative behaviour. At first, there is no coherent communicative behaviour in the simulated population. Over time, a coherent shared syntactic system emerges. The syntactic systems which have been achieved in this research paradigm are all, of course, simpler than real attested languages, but nevertheless possess many of the central traits of natural language syntactic organisation, including recursivity, compositionality of meaning, asymmetric distribution of regular and irregular forms according to frequency, grammatical functional elements with no denotational meaning, grammatical markers of passive voice and of reflexivity, and elementary partitioning into phrases.

p. 56

computer simulations and invisible hand

  • summed independent actions of individuals
  • but: not intentionally constructed by any individual

Simulations within an ILM framework strip the interaction between individuals down to the bare minimum from which language-like systems can be shown to emerge. The key property of these models is that each new generation learns its language from a restricted set of exemplars produced by the preceding generation.
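
The restricted set of exemplars described above can be sketched as a chain of rote learners. This is a deliberately impoverished illustration, not any of the cited models: the learner only memorises what it hears and invents a random signal for every meaning it did not observe. The names `transmit` and `survival`, the four-letter alphabet and the signal length are all assumptions made for the example.

```python
import random
from typing import Dict

Language = Dict[int, str]  # meaning id -> signal

ALPHABET = "abcd"  # assumed signal alphabet


def random_signal(rng: random.Random, length: int = 4) -> str:
    """Invent an arbitrary signal."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def transmit(parent: Language, n_exemplars: int, rng: random.Random) -> Language:
    """One generation: the child hears n_exemplars meaning-signal pairs
    (sampled with replacement), memorises them, and invents the rest."""
    meanings = list(parent)
    observed = [rng.choice(meanings) for _ in range(n_exemplars)]
    child = {m: parent[m] for m in observed}
    for m in meanings:
        if m not in child:
            child[m] = random_signal(rng)
    return child


def survival(initial: Language, n_exemplars: int, generations: int,
             seed: int = 0) -> float:
    """Fraction of the initial meaning-signal pairs still intact after
    the given number of generations."""
    rng = random.Random(seed)
    language = dict(initial)
    for _ in range(generations):
        language = transmit(language, n_exemplars, rng)
    return sum(language[m] == initial[m] for m in initial) / len(initial)


if __name__ == "__main__":
    start = {m: "zzzz" for m in range(20)}
    for n in (5, 20, 80):
        print(n, survival(start, n_exemplars=n, generations=10))
```

Because a rote learner cannot recover anything it did not hear, the size of the exemplar set directly limits how much of a holistic language survives each generation.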

p. 219

The origin and subsequent evolution of language


Robin I.M. Dunbar


Why did language evolve?

p. 220

social hypothesis

  • we need language to knit together a large social group
  • (refutation of other hypotheses)
p. 222

The precursors of language

not relevant to me

p. 229

Why do languages diversify?

p. 230

linguistic diversification

  • why do new languages spawn at crazy rates?

1. drift

  • the gradual accumulation of accidental mutations over long periods of time
  • e.g. mispronunciations, unintended slippages of meaning …

2. need to differentiate communities

  • ‘freerider’ → the individual who takes the benefits of cooperation, but does not pay the costs
  • (Nettle 1999 simulation)
  • dialects are particularly well designed to act as badges of group membership that allow everyone to identify members of their exchange group
p. 232

Beyond this, however, we have little real understanding of the processes involved in either dialect change or language evolution, or for that matter in the functions that these processes subserve. We assume that these functions are largely social, and we have some understanding of the types of process that can precipitate language change (trade, colonisation, emulation of culturally or economically superior groups, etc.), but by and large there is little other than conjecture to explain why these processes exist or why they should work in the way they do.

Part of the problem is, of course, the temporal scale on which these changes occur (generations in the case of dialects, perhaps millennia in the case of languages). Inevitably, this makes it all but impossible for us to observe the process first-hand.

p. 272

From language learning to language evolution


Simon Kirby and Morten H. Christiansen


Introduction

| animal communication | human communication |
|---|---|
| mostly determined by genes | learnt to a very high degree |
| | ↳ details of human communication are stored in the environment |
p. 273

↓ why?

size of human language

  • human language cannot fit into a genome

↳ language learning

  • an important aspect
  • but: also automatically leads to language variation

From universals to universal bias

p. 274

The environment that the learner interacts with consists of the ‘output’ of that learning by other individuals.

Constraints on variation


Universal features of language (which constrain it)


1. digital infinity

  • ‘small’ language inventory allows for an unlimited range of utterances

2. compositionality

  • the meaning of an utterance is a function of the meanings of parts of that utterance and the way they are put together
p. 275

3. typological properties

  • e.g. branching direction etc.

Acquisition as explanation

p. 277

object of study

  • the set of constraints and preferences that children have, and which are brought to bear on the task of language learning

Sequential learning


How do we explain universals that appear to be unique to language?


SRNs and language learning

SRN

  • simple recurrent neural network
  • (author’s note: mighty impressive for 2003)
p. 279

SRNs and learning-based constraints

sequence-based learning

  • an SRN is trained on many different types of grammars (different combinations, with one difference every time)
p. 280

learnability score

  • the ability of a network to correctly predict continuation probabilities after being trained on a corpus
  • how well does each of the thirty-two grammars fit the prior bias of an SRN sequential learner?

↳ branching consistency

  • a typological property (verb word order structure and pre/postposition parallels)
  • greatly influences the learnability score! (Christiansen and Devlin 1997)
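
The scoring idea (compare predicted continuation probabilities with the ones the corpus actually supports) can be sketched without a neural network. In the sketch below a plain bigram counter stands in for the SRN, which is a substitution made for brevity and not the method of the cited study. The error measure (mean squared error per context), the boundary symbols and the function names are likewise assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence

Sentence = Sequence[str]
Dist = Dict[str, float]


def continuation_probs(corpus: List[Sentence]) -> Dict[str, Dist]:
    """Relative frequency of each next symbol given the previous symbol."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        symbols = ["<s>", *sentence, "</s>"]  # assumed boundary markers
        for prev, nxt in zip(symbols, symbols[1:]):
            counts[prev][nxt] += 1
    return {prev: {s: c / sum(ctr.values()) for s, c in ctr.items()}
            for prev, ctr in counts.items()}


def prediction_error(model: Dict[str, Dist], target: Dict[str, Dist]) -> float:
    """Mean squared error between predicted and target continuation
    probabilities, averaged over the contexts in target.
    Lower error means the target grammar was easier for this learner."""
    errors = []
    for context, true_dist in target.items():
        predicted = model.get(context, {})
        symbols = set(true_dist) | set(predicted)
        squared = sum((predicted.get(s, 0.0) - true_dist.get(s, 0.0)) ** 2
                      for s in symbols)
        errors.append(squared / len(symbols))
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Toy category sequences (N = noun, V = verb, P = adposition).
    training = [["N", "V"], ["N", "P", "N", "V"]]
    held_out = [["N", "V"], ["N", "P", "N", "P", "N", "V"]]
    model = continuation_probs(training)
    print(prediction_error(model, continuation_probs(held_out)))
```

To compare grammars in this way, one would generate a corpus from each grammar, train the same learner on each, and rank the grammars by their error.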

This is clearly a striking result. Why postulate a domain-specific constraint if the data that constraint should account for is predicted by a general model of sequential learning?

p. 281

This type of result was also found in an experimental setting (Christiansen and Ellefson 2002).

  • ↳ SRNs mirror learning biases of humans

What these experiments with ALL [artificial language learning] and SRNs show us is that we should be careful about ascribing universal properties of language to a domain-specific innate bias. We argue that an explanation that appeals to non-linguistic biases should be preferred where possible. Simplistic, all-or-nothing explanations should be avoided, however.

p. 282

Iterated learning and the origins of structure

p. 283

where do the data to learn from come from?

The ILM

feedback loop

  • the data that make up the input to learning are themselves the output of that same process
p. 284
p. 283

Iterated Learning Model (ILM)

  • a model of language evolution (Kirby and Hurford 2002)
  • a multi-agent model → treats populations as consisting of sets of individuals (agents)
  • each agent learns its behaviour by observing the behaviour of others (and consequently contributes to the experience of other agents’ learning)
p. 285

evolutionary model

  • dynamic behaviour
  • behaviour is not pre-determined, but emerges from the process of repeated use and acquisition from generation to generation
  • biological evolution → linguistic transmission

The origins of compositionality

compositionality

  • a unique property of human languages
  • combining signs to create complex meaning
p. 286
Computer modelling

computer modelling

  • ‘increasingly popular’ (see Kirby 2002 for a review)
  • gives us an easy way to uncover the relationship between the components of a complex system (individual learners) and the emergent outcomes of their interactions

Typical components of simulation models


1. a population of agents

2. a space of possible signals

  • usually strings of symbols

3. a space of possible meanings

  • usually some kind of structured representation, such that some meanings are more similar to each other than others

4. a production model

  • this determines how, when prompted with a meaning, an agent uses its knowledge of language to produce a signal

5. a learning model

  • the learning model defines how an individual agent acquires its knowledge of language from observing meaning-signal pairs produced by other agents

The fact that the learners are given meanings as well as signals seems unrealistic. Ultimately, simulations of the process of iterated learning will need to enrich the model with contexts. Whereas meanings are private and inaccessible to learners, contexts are public and may allow the inference of meanings. See e.g. Steels et al. (200) and Smith (2001) for discussion of these fascinating extensions to the model.

p. 287

↳ common property of ILM simulations

  • the language of the population persists only by virtue of its constant use and acquisition by individual agents

Composition of the simulation


initialisation of a simulation

  • random language at first
  • agents initially produce purely random strings of symbols for every meaning that they are prompted with
  • usually highly unstable

poverty of the stimulus

  • UG idea: learners are exposed to few input data
  • in the same way: complete coverage of all meanings cannot be assumed

stabilisation

  • eventually, some part of the language will stabilise
  • after this: passed on from generation to generation
  • time to stabilisation depends on the particular population’s dynamics

ideal distribution

  • more and more of the language increases in stability, until eventually signals corresponding to the entire meaning space are passed on reliably from generation to generation without the learner being exposed to the whole language
  • ⇒ final language uses a compositional system
p. 287-288
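
The course of a simulation as summarised above (random initial language, bottleneck, stabilisation, compositional end state) can be sketched with a toy learner. Everything here is an assumption made for illustration: meanings are pairs of feature values, signals are two syllables, and the learner is compositional by construction (it votes on a syllable per feature value), which is a far stronger built-in bias than in the published models. `learn`, `is_compositional` and `iterate` are hypothetical names.

```python
import random
from collections import Counter, defaultdict
from itertools import product
from typing import Dict, List, Tuple

Meaning = Tuple[int, int]
Language = Dict[Meaning, str]

SYLLABLES = ["ba", "di", "gu", "ko", "me", "nu"]  # assumed signal inventory
MEANINGS: List[Meaning] = list(product(range(3), range(3)))


def random_language(rng: random.Random) -> Language:
    """Holistic start: an unrelated two-syllable signal per meaning."""
    return {m: rng.choice(SYLLABLES) + rng.choice(SYLLABLES) for m in MEANINGS}


def learn(exemplars: List[Tuple[Meaning, str]], rng: random.Random) -> Language:
    """Memorise observed pairs; for unseen meanings, combine the most
    frequently heard first syllable for a and second syllable for b."""
    memorised = dict(exemplars)
    first, second = defaultdict(Counter), defaultdict(Counter)
    for (a, b), signal in exemplars:
        first[a][signal[:2]] += 1
        second[b][signal[2:]] += 1

    def syllable_for(votes: Counter) -> str:
        if not votes:  # no evidence: invent once and remember the choice
            votes[rng.choice(SYLLABLES)] += 1
        return votes.most_common(1)[0][0]

    return {(a, b): memorised.get((a, b))
            or syllable_for(first[a]) + syllable_for(second[b])
            for a, b in MEANINGS}


def is_compositional(language: Language) -> bool:
    """True if each feature value always maps to the same syllable.
    Distinct values may share a syllable, so ambiguity is not penalised."""
    pre, suf = {}, {}
    for (a, b), signal in language.items():
        if pre.setdefault(a, signal[:2]) != signal[:2]:
            return False
        if suf.setdefault(b, signal[2:]) != signal[2:]:
            return False
    return True


def iterate(generations: int, bottleneck: int, seed: int = 0):
    """Run a single-agent-per-generation chain; record compositionality."""
    rng = random.Random(seed)
    language = random_language(rng)
    history = [is_compositional(language)]
    for _ in range(generations):
        sample = rng.sample(MEANINGS, bottleneck)  # poverty of the stimulus
        language = learn([(m, language[m]) for m in sample], rng)
        history.append(is_compositional(language))
    return language, history


if __name__ == "__main__":
    final, history = iterate(generations=30, bottleneck=5)
    print(history)
    print(final)
```

Under this learner a compositional language is always passed on as a compositional language, whatever sample the child happens to hear, so the recorded flags never flip back from True to False. A holistic language has no such guarantee, since only the observed part of it is transmitted.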

Explaining the behaviour of simulations

How are these languages stable, and why are they compositional?

The initial holistic languages are unstable by virtue of the poverty of the stimulus, which acts as a bottleneck on the transmission of language. In the early stages of the simulation, the population is essentially randomly searching around the space of possible meaning-string pairs, driven by the learner’s failure to generalise (non-randomly) to unheard meanings.

At some point a learner will infer some form of non-random behaviour in a speaker and use this to generalize to other meanings. In the first instance, this inference of ‘rule-like’ behaviour will actually be ill-founded (since the speaker will have been behaving randomly). Nevertheless, this learner will now produce utterances that are, to a small extent, non-random.

p. 288

chain of generalisation

  • languages generated by a learner who has generalised are themselves generalisable by other learners
  • ⇒ the aspects of language that are generalisable in this way are more stable
    • better attuned to the acquisition bottleneck

end result

  • movement towards a language made up of generalisations
  • ⇒ meanings with internal structure, signals with internal structure
p. 289

Why do some kinds of holism persist?


frequency effects

  • added in Kirby (2001)
  • some meanings turn up more frequently than others (built into the simulation)
  • result: a language that utilises both compositional structure and holistic expressions

↳ bottleneck explanation

  • frequently used expressions may be faithfully transmitted (even if they are idiosyncratic) because of their high frequency
  • infrequent expressions must form part of a larger paradigm
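
The bottleneck explanation comes down to simple arithmetic: if a meaning is expressed with probability p on each occasion, the chance that a learner hears it at least once in n exemplars is 1 - (1 - p)^n. The sketch below works this out for a skewed frequency distribution. The Zipf-like weights (1/rank), the ten meanings and the twenty exemplars are assumptions chosen for illustration, not the settings of Kirby (2001).

```python
from typing import List


def p_observed(p: float, n: int) -> float:
    """Probability that a meaning with per-exemplar probability p
    turns up at least once among n independent exemplars."""
    return 1.0 - (1.0 - p) ** n


def zipf_probs(n_meanings: int) -> List[float]:
    """Assumed frequency skew: weight 1/rank, normalised to sum to 1."""
    weights = [1.0 / rank for rank in range(1, n_meanings + 1)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    probs = zipf_probs(10)
    for rank, p in enumerate(probs, start=1):
        print(rank, round(p, 3), round(p_observed(p, 20), 3))
```

High-frequency meanings are almost certain to pass through the bottleneck, so an idiosyncratic form for them can be memorised and survive. Rare meanings are often missed, so their forms survive only if they can be reconstructed from a regular pattern.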

If, for reasons such as language contact or processes of phonological erosion, irregulars make their way into a language, the pressure to regularise them will be strongest in the low-frequency parts of the system.

An interesting by-product of the introduction of frequency biases to the meaning space is the removal of a fixed endpoint to the simulations. The language in this model is always changing – but not so much that speaker-to-speaker intelligibility is degraded. This is another way in which these simulation results seem to mirror what we know about language more accurately.

p. 290

Implications and conclusion

Population dynamics

| vertical transmission | horizontal transmission |
|---|---|
| learners mainly learn from adults | a lot of contact between learners |
| language changes slowly | language changes fast |
| may take many generations to stabilise on a structured system | structure can emerge rapidly |
p. 292

Three principal assumptions of IL models


  1. agents have structured representations of the world
  2. learners have some way of inferring the meaning of a particular signal
    • at least some of the time, they can mind read
  3. speakers are inclined to communicate about an open-ended range of topics

By systematically varying the representations of meanings to which the agents in the ILM have access, we are able to see under which circumstances structured mappings between meanings and signals will emerge.

biological evolution

  • also plays a part, but which part and how much?
p. 293